(C) 2018 by Damir Cavar
Version: 1.0, January 2018
This is a tutorial related to the L665 course on Machine Learning for NLP focusing on Deep Learning, Spring 2018 at Indiana University.
The following material is based on Linear Algebra Review and Reference by Zico Kolter (updated by Chuong Do) from September 30, 2015. This means that many passages are copied literally and many are rewritten. I do not mark sections that are added or different. Consider this notebook an extended annotation of Kolter's (and Do's) notes. See also James E. Gentle (2017) Matrix Algebra: Theory, Computations and Applications in Statistics. Second edition. Springer. Another good resource is Philip N. Klein (2013) Coding the Matrix: Linear Algebra through Applications to Computer Science, Newtonian Press.
For an introduction, see the notebook "Linear Algebra".
In this section we present some basic definitions of matrix calculus and provide a few examples.
Suppose that $f : \mathbb{R}^{m\times n} \rightarrow \mathbb{R}$ is a function that takes as input a matrix $A$ of size $m \times n$ and returns a real value.
Then the gradient of $f$ (with respect to $A \in \mathbb{R}^{m\times n}$) is the matrix of partial derivatives, defined as:
$$\nabla_A f(A) = \begin{bmatrix} \frac{\partial f(A)}{\partial A_{11}} & \frac{\partial f(A)}{\partial A_{12}} & \cdots & \frac{\partial f(A)}{\partial A_{1n}} \\ \frac{\partial f(A)}{\partial A_{21}} & \frac{\partial f(A)}{\partial A_{22}} & \cdots & \frac{\partial f(A)}{\partial A_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f(A)}{\partial A_{m1}} & \frac{\partial f(A)}{\partial A_{m2}} & \cdots & \frac{\partial f(A)}{\partial A_{mn}} \end{bmatrix}$$
That is, $\nabla_A f(A)$ is an $m \times n$ matrix whose $(i,j)$ entry is $(\nabla_A f(A))_{ij} = \frac{\partial f(A)}{\partial A_{ij}}$.
See page 20 of Kolter's Linear Algebra Review and Reference.
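To make the definition concrete, here is a minimal NumPy sketch (the function f, the helper numerical_gradient, and the test matrix A below are illustrative choices of mine, not from Kolter's notes): for $f(A) = \sum_{i,j} A_{ij}^2$ the gradient is $\nabla_A f(A) = 2A$, and we can check this analytical result against a central-difference approximation of each partial derivative.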
In [ ]:
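import numpy as np

# Illustrative example (not from Kolter's notes): f maps an m x n matrix
# to a real number by summing the squares of its entries, so each partial
# derivative is d f / d A_ij = 2 * A_ij, i.e. the gradient is 2A.

def f(A):
    return np.sum(A ** 2)

def numerical_gradient(f, A, eps=1e-6):
    """Approximate the gradient of f at A, entry by entry, with central differences."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A_plus = A.copy()
            A_plus[i, j] += eps
            A_minus = A.copy()
            A_minus[i, j] -= eps
            grad[i, j] = (f(A_plus) - f(A_minus)) / (2 * eps)
    return grad

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(numerical_gradient(f, A))  # numerical approximation, close to 2 * A
print(2 * A)                     # analytical gradient, for comparison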